{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "65704e88783c5e74",
   "metadata": {},
   "source": [
    "### 前言\n",
    "\n",
    "本篇主要介绍end-to-end的LLMops流程中的数据->SFT微调->发布->推理流程，使用的SDK版本为0.1.3。建议提前熟悉预测服务相关SDK功能作为前置知识。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "eaa80676d2e3890d",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-03-20T02:06:41.635384Z",
     "start_time": "2024-03-20T02:06:41.614171Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "\n",
    "# 初始化百度智能云的IAM ak, sk用于bos和千帆平台的鉴权\n",
    "# 大模型平台和Bos同处于百度智能云下，所以可以使用同一个AK，SK来通过权限校验\n",
    "os.environ[\"QIANFAN_ACCESS_KEY\"] = \"your_iam_ak\"\n",
    "os.environ[\"QIANFAN_SECRET_KEY\"] = \"your_iam_sk\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1127550b09a31808",
   "metadata": {},
   "source": [
    "本文使用的千帆版本\n",
    "```\n",
    "qianfan>=0.1.3\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3e9437beef13deb",
   "metadata": {},
   "source": [
    "## 数据上传\n",
    "\n",
    "在进行SFT微调训练前，我们需要准备我们的训练数据；不同的训练任务需要准备不同类型的数据集，具体来说，对于LLM SFT训练任务，需要准备的是`已标注的、非排序的对话数据集`\n",
    "推荐使用的数据格式为`jsonl`，即每一行文本都包含了一个json字符串，此json需要包含prompt，response两个字段，以下是一个示例，[下载](https://console.bce.baidu.com/api/qianfan/canghai/entity/static/sample-text-dialog-unsort-annotated.jsonl)：\n",
    "```\n",
    "[{\"prompt\" : \"你好\", \"response\": [[\"你需要什么帮助\"]]}]\n",
    "```\n",
    "每一行表示一组数据，每组数据中的prompt和response加起来之和字符数不超过8000Token（包括中英文、数字、符号等），超出部分将被截断。\n",
    "\n",
    "### Bos\n",
    "\n",
    "Bos是百度智能云提供的对象存储云服务，可以高效的存取数据。本篇教程基于Bos，实现本地的数据集到千帆平台数据集的导入："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "82f1757fb6b05f49",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: bce-python-sdk in /Users/guoweiming/Library/Caches/pypoetry/virtualenvs/qianfan-rkkovCpG-py3.9/lib/python3.9/site-packages (0.9.5)\n",
      "Requirement already satisfied: pycryptodome>=3.8.0 in /Users/guoweiming/Library/Caches/pypoetry/virtualenvs/qianfan-rkkovCpG-py3.9/lib/python3.9/site-packages (from bce-python-sdk) (3.20.0)\n",
      "Requirement already satisfied: future>=0.6.0 in /Users/guoweiming/Library/Caches/pypoetry/virtualenvs/qianfan-rkkovCpG-py3.9/lib/python3.9/site-packages (from bce-python-sdk) (1.0.0)\n",
      "Requirement already satisfied: six>=1.4.0 in /Users/guoweiming/Library/Caches/pypoetry/virtualenvs/qianfan-rkkovCpG-py3.9/lib/python3.9/site-packages (from bce-python-sdk) (1.16.0)\n",
      "\n",
      "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.3.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m24.0\u001B[0m\n",
      "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n"
     ]
    }
   ],
   "source": [
    "# 首先我们需要安装bce-python-sdk\n",
    "!pip install bce-python-sdk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "bfbc4e0703280344",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{metadata:{date:u'Thu, 09 Nov 2023 10:50:57 GMT',content_length:u'0',connection:u'keep-alive',content_md5:u'kbo1u82WYdCFGVLAbeqXbQ==',etag:u'91ba35bbcd9661d0851952c06dea976d',server:u'BceBos',bce_content_crc_32:u'86170999',bce_debug_id:u'JUrX2nUmpvcbaRPRMsY+uS3KUFDB1YjYIbZ9aaJtEgw16FpXFpCwVQG7+iVDt2rD4dVWAh+SmNZzCEUXGOXHiQ==',bce_flow_control_type:u'-1',bce_is_transition:u'false',bce_request_id:u'b65583f2-c7fb-4fa6-ad52-c07569270120'}}"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from baidubce.bce_client_configuration import BceClientConfiguration\n",
    "from baidubce.auth.bce_credentials import BceCredentials\n",
    "from baidubce.services.bos.bos_client import BosClient\n",
    "\n",
    "# 初始化bos配置\n",
    "BosEndpoint = \"bj.bcebos.com\"\n",
    "bucket_name = \"your_bucket_name\"\n",
    "\n",
    "bos_config = BceClientConfiguration(\n",
    "    credentials=BceCredentials(os.environ[\"QIANFAN_ACCESS_KEY\"], os.environ[\"QIANFAN_SECRET_KEY\"]),\n",
    "    endpoint=BosEndpoint\n",
    ")\n",
    "\n",
    "file_name = \"./data/fin_cqa_train.jsonl\"\n",
    "key = \"/data/fin_cqa_train.jsonl\"\n",
    "prefix = \"/data/\"\n",
    "\n",
    "bos_client = BosClient(bos_config)\n",
    "bos_client.put_object_from_file(bucket_name, key, file_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa69bc2afe4ab61d",
   "metadata": {},
   "source": [
    "## 大模型平台鉴权介绍：\n",
    "\n",
    "大模型平台和Bos同处于百度智能云下，所以可以使用同一个AK，SK来通过权限校验："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a16a4c8a5e2090b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "# os.environ[\"QIANFAN_ACCESS_KEY\"] = \"your_iam_ak\"\n",
    "# os.environ[\"QIANFAN_SECRET_KEY\"] = \"your_iam_sk\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e88e8ff2abb019c5",
   "metadata": {},
   "source": [
    "## 数据导入\n",
    "\n",
    "在完成了以上从本地到bos的上传过程后，我们就开始着手创建数据集并导入之前上传到bos的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47db3a86a7d55092",
   "metadata": {},
   "outputs": [],
   "source": [
    "from qianfan.resources import Data\n",
    "from qianfan.resources.console.consts import DataSetType, DataProjectType, DataTemplateType, DataStorageType\n",
    "\n",
    "# 创建数据集\n",
    "ds = Data.create_bare_dataset(\n",
    "    name=\"random_hi_sft_ds\", \n",
    "    data_set_type=DataSetType.TextOnly,\n",
    "    project_type=DataProjectType.Conversation,\n",
    "    template_type=DataTemplateType.NonSortedConversation,\n",
    "    storage_type=DataStorageType.PrivateBos,\n",
    "    storage_id=bucket_name,\n",
    "    storage_path=prefix\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c74af91aeec149a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用bos进行数据导入\n",
    "from qianfan.resources.console.consts import DataSourceType\n",
    "\n",
    "ds_id=ds[\"result\"][\"datasetId\"]\n",
    "import_resp = Data.create_data_import_task(\n",
    "    dataset_id=ds_id,\n",
    "    is_annotated=True,\n",
    "    import_source=DataSourceType.PrivateBos,\n",
    "    file_url=\"bos:/{}{}\".format(bucket_name, key)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3fa8ba2943fed15d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取数据集详情\n",
    "ds_info = Data.get_dataset_info(ds_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a4c0d039bbf87d9",
   "metadata": {},
   "source": [
    "### 监听导入状态\n",
    "\n",
    "由于数据集导入是一个耗时任务，所以我们需要等待其完成才能进行下一步的动作，这里我们通过轮询的方式简单的监听任务状态直到数据完成导入成功。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "71ceb1d2e8faa655",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current_import_status 1\n",
      "dataset import finish, ready to release\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "from qianfan.resources.console.consts import DataImportStatus\n",
    "while True:\n",
    "    # 获取数据集详情\n",
    "    ds_info = Data.get_dataset_info(ds_id)\n",
    "    import_status = ds_info[\"result\"][\"versionInfo\"][\"importStatus\"]\n",
    "    if import_status == DataImportStatus.Finished.value:\n",
    "        print(\"dataset import finish, ready to release\")\n",
    "        break\n",
    "    print(\"current_import_status\", import_status)\n",
    "    time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3031ae9281f15d8",
   "metadata": {},
   "source": [
    "## 发布数据集\n",
    "\n",
    "恭喜你到达了进行SFT训练的最后一步，我们已经完成了数据集的准备，现在需要发布数据集。\n",
    "> Note：\n",
    "> 发布数据集后后无法再进行数据集的处理，导入或者修改！\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6401d16bf10e19d7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current_release_status 1\n",
      "current_release_status 1\n",
      "dataset release finish, ready to train\n"
     ]
    }
   ],
   "source": [
    "from qianfan.resources.console.consts import DataReleaseStatus\n",
    "\n",
    "# 发布 并监听数据集发布状态\n",
    "resp = Data.release_dataset(ds_id)\n",
    "\n",
    "while True:\n",
    "    # 获取数据集详情\n",
    "    ds_info = Data.get_dataset_info(ds_id)\n",
    "    release_status = ds_info[\"result\"][\"versionInfo\"][\"releaseStatus\"]\n",
    "    if release_status == DataReleaseStatus.Finished.value:\n",
    "        print(\"dataset release finish, ready to train\")\n",
    "        break\n",
    "    print(\"current_release_status\", release_status)\n",
    "    time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1880e0cbbc4dc56c",
   "metadata": {},
   "source": [
    "至此，数据部分的准备已经完成！我们话不多说赶紧开始LLM的Finetune："
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9af965042102f5d7",
   "metadata": {},
   "source": [
    "## Finetune\n",
    "\n",
    "目前千帆平台支持如下 SFT 相关操作：\n",
    "* 创建训练任务\n",
    "* 创建任务运行\n",
    "* 获取任务运行详情\n",
    "* 停止任务运行\n",
    "\n",
    "### 创建SFT任务\n",
    "\n",
    "创建训练任务需要提供任务名称`name`和任务描述`description`，返回结果在`result`字段中，具体字段与API 文档一致。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "cc16e7ac0cd52d02",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'job-jk7fbup3znye'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qianfan.resources import FineTune\n",
    "# 创建任务\n",
    "create_job_resp = FineTune.V2.create_job(\n",
    "    name=\"random_sdk_task\",\n",
    "    # description=\"eb speed tasks\",\n",
    "    model=\"ERNIE-Speed-8K\",\n",
    "    train_mode=\"SFT\",\n",
    ")\n",
    "\n",
    "# 获取任务ID\n",
    "job_id = create_job_resp[\"result\"][\"jobId\"]\n",
    "job_id\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99128e3f470304b5",
   "metadata": {},
   "source": [
    "### 创建任务运行\n",
    "\n",
    "创建任务运行需要提供该次训练的详细配置，例如模型版本（`trainType`）、数据集(`trainset`)等等，且不同模型的参数配置存在差异，具体参数可以参见API 文档。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21a5d12e4e092c3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建任务运行，具体参数可以参见 API 文档\n",
    "\n",
    "resp = FineTune.V2.create_task(\n",
    "    job_id=job_id,\n",
    "    params_scale=\"FullFineTuning\",\n",
    "    hyper_params={\n",
    "        \"epoch\": 1,\n",
    "        \"learningRate\": 0.00003,\n",
    "        \"maxSeqLen\": 4096\n",
    "    },\n",
    "    dataset_config={\n",
    "        \"sourceType\": \"Platform\",\n",
    "        \"versions\": [{\n",
    "            \"versionId\": ds_id\n",
    "        }],\n",
    "        \"splitRatio\": 20\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "188c8b18f904cf15",
   "metadata": {},
   "source": [
    "这一步会监听训练`进度`，同时也观察训练任务状态，根据训练的模型大小，方法等的不同，需要一定的时间才能进行下一步模型发布。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a05be844accde86b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Running\n",
      "task status: Done\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "task_id = resp[\"result\"][\"taskId\"]\n",
    "while True:\n",
    "    task_status_resp = FineTune.V2.task_detail(task_id=task_id)\n",
    "    task_status = task_status_resp[\"result\"][\"runStatus\"]\n",
    "    print(\"task status:\", task_status)\n",
    "    if task_status != 'Running':\n",
    "        break\n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eda14584e8b34fd5",
   "metadata": {},
   "source": [
    "### 发布模型\n",
    "\n",
    "发布新模型需要指定task_id和iterationsId（job_id）；\n",
    "如果是希望进行同个模型的多次迭代更新"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "23d8a276c274bc80",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model_id: 17313\n",
      "model_str_id am-6q6dhia9rhb8\n",
      "model_version 21331\n",
      "model_version_str_id amv-avbt5d6q6xb4\n"
     ]
    }
   ],
   "source": [
    "from qianfan.resources import Model\n",
    "\n",
    "sft_task_publish_resp = Model.publish(\n",
    "    is_new=True,\n",
    "    model_name=\"random_test_sdk\",\n",
    "    version_meta={\n",
    "        \"iterationId\": task_id,\n",
    "        \"taskId\": job_id,\n",
    "    }\n",
    ")\n",
    "# 获取model_id and version\n",
    "model_id =  sft_task_publish_resp[\"result\"][\"modelId\"]\n",
    "model_str_id = sft_task_publish_resp[\"result\"][\"modelIDStr\"]\n",
    "model_version_id = sft_task_publish_resp[\"result\"][\"versionId\"]\n",
    "model_version_str_id = sft_task_publish_resp[\"result\"][\"versionIdStr\"]\n",
    "print(\"model_id:\", model_id)\n",
    "print(\"model_str_id\", model_str_id)\n",
    "print(\"model_version\", model_version_id)\n",
    "print(\"model_version_str_id\", model_version_str_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "9632247b82be0d1e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model_version_id 21331\n"
     ]
    }
   ],
   "source": [
    "# 获取模型版本信息：\n",
    "model_version_list = Model.list(model_id=model_str_id)\n",
    "model_version_id: int = 0 \n",
    "for m in model_version_list[\"result\"][\"modelVersionList\"]:\n",
    "    if m[\"modelIdStr\"] == model_str_id and m[\"modelVersionIdStr\"] == model_version_str_id:\n",
    "        model_version_id = m[\"modelVersionId\"]\n",
    "if model_version_id == 0:\n",
    "    raise ValueError(\"not model version\")\n",
    "print(\"model_version_id\", model_version_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3efc839e38009622",
   "metadata": {},
   "source": [
    "### 监听模型版本详情状态\n",
    "模型任务在训练FINISH之后，需要等待模型版本状态为READY，模型才算完全发布到模型仓库中，这一步也可以在web控制台中的我的模型/详情中看到。\n",
    "这一步会进行模型发布，保存到我的模型仓库中，根据模型的大小，可能需要等待若干分钟之后才能进行下一步模型服务部署。\n",
    "![my_model](img/my_model.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a2da6e3f37a6eab3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current model_version_state: Ready\n"
     ]
    }
   ],
   "source": [
    "# 获取模型版本详情\n",
    "# 模型版本状态有三种：Creating, Ready, Failed\n",
    "while True:\n",
    "    model_detail_info = Model.detail(model_version_id=model_version_str_id)\n",
    "    model_version_state = model_detail_info[\"result\"][\"state\"]\n",
    "    print(\"current model_version_state:\", model_version_state)\n",
    "    if model_version_state != \"Creating\":\n",
    "        break\n",
    "    time.sleep(60)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcd9a3065a7d3d02",
   "metadata": {},
   "source": [
    "### 模型评估\n",
    "\n",
    "用户可以在创建模型服务之前，先使用平台提供的模型评估功能对训练好的模型进行评估。\n",
    "我们以[金融新闻摘要数据集](https://console.bce.baidu.com/qianfan/data/dataset/15067/detail)作为我们评估使用的数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "74fe7acc659ff9e8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "eval_task_id ame-8z3garj40ywr\n"
     ]
    }
   ],
   "source": [
    "eval_create_response = Model.create_evaluation_task(\n",
    "    name=\"random_fin_test\",\n",
    "    version_info=[\n",
    "        {\n",
    "            \"modelId\": model_str_id,\n",
    "            \"modelVersionId\": model_version_str_id,\n",
    "        },\n",
    "    ],\n",
    "    dataset_id=\"ds-jun2ns7h96jzvwaq\",\n",
    "    eval_config={\n",
    "        \"evalMode\": \"rule\",\n",
    "        \"scoreModes\": [\n",
    "            \"similarity\",\n",
    "            \"accuracy\",\n",
    "        ],\n",
    "    },\n",
    "    dataset_name=\"FinCUGE_FinNA\",\n",
    ").body\n",
    "\n",
    "eval_task_id_str = eval_create_response[\"result\"][\"evalIdStr\"]\n",
    "print(\"eval_task_id\", eval_task_id_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34b1242092fe02e1",
   "metadata": {},
   "source": [
    "由于评估也是一项非常耗时的任务，因此我们同样需要监听任务状态，直到评估完成。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "fc9a084d83ae3c02",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Doing\n",
      "current eval_state: Done\n"
     ]
    }
   ],
   "source": [
    "# 缓冲，等待任务真正开始运行\n",
    "time.sleep(5)\n",
    "\n",
    "while True:\n",
    "    eval_info = Model.get_evaluation_info(eval_task_id_str)\n",
    "    eval_state = eval_info[\"result\"][\"state\"]\n",
    "    print(\"current eval_state:\", eval_state)\n",
    "    if eval_state != \"Doing\":\n",
    "        break\n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "ad4d8e42dd966c5f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'modelName': 'test_sdk_2P36k_', 'modelVersion': '1', 'modelVersionSource': 'Train', 'evalMode': 'rule', 'evaluationName': 'fin_test_gs3Lr_', 'id': '65f96b97e169f906e3109963', 'modelVersionId': 21331, 'modelId': 17313, 'userId': 1355264, 'evaluationId': 5652, 'effectMetric': {'accuracy': 0.286, 'f1Score': 0.2552728, 'rouge_1': 0.26005724, 'rouge_2': 0.1350722, 'rouge_l': 0.24951611, 'bleu4': 0.06346849, 'avgJudgeScore': 0, 'stdJudgeScore': 0, 'medianJudgeScore': 0, 'scoreDistribution': None, 'manualAvgScore': 0, 'goodCaseProportion': 0, 'subjectiveImpression': '', 'manualScoreDistribution': None}, 'performanceMetric': {}}]\n"
     ]
    }
   ],
   "source": [
    "eval_result = Model.get_evaluation_result(eval_task_id_str)\n",
    "print(eval_result[\"result\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "855188c4dae44fbe",
   "metadata": {},
   "source": [
    "评估完成后的详细评估报告目前仅支持在[网页](https://console.bce.baidu.com/qianfan/modelcenter/model/eval/list)查看"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac7f70d577555cb5",
   "metadata": {},
   "source": [
    "### 创建模型服务\n",
    "\n",
    "这一步用于创建一个在线服务，获取到service Id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "381e8eeaa757817a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from qianfan.resources.console.consts import DeployPoolType\n",
    "from qianfan.resources import Service\n",
    "\n",
    "\n",
    "g = Service.create(\n",
    "    model_id = model_id, \n",
    "    model_version_id = model_version_id, \n",
    "    name=\"random_tests\",\n",
    "    uri=\"random_sdk\",\n",
    "    replicas=1, \n",
    "    pool_type=DeployPoolType.PrivateResource\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d6dd4f76deb80366",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5729"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "svc_id = g[\"result\"][\"serviceId\"]\n",
    "svc_id"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b065d3c020e71859",
   "metadata": {},
   "source": [
    "### 部署模型服务\n",
    "\n",
    "这一步由于需要涉及到资源的服务逻辑，所以目前需要在web上操作付费，完成付费之后即可使用模型推理服务。\n",
    "![deploy_pay](img/deploy_pay.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "7dccfdc65668451b",
   "metadata": {
    "tags": [
     "cell_skip"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "svc deploy status: New\n",
      "svc deploy status: New\n",
      "svc deploy status: New\n",
      "svc deploy status: New\n",
      "svc deploy status: New\n",
      "svc deploy status: New\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Deploying\n",
      "svc deploy status: Done\n",
      "sft_model_endpoint: xxxx"
     ]
    }
   ],
   "source": [
    "# 资源付费完成后，serviceStatus会变成Deploying，查看模型服务状态, 直到serviceStatus变成部署完成，得到model_endpoint\n",
    "# 这一步涉及到资源调度，需要等待5-20分钟不等\n",
    "while True:\n",
    "    resp = Service.get(id = svc_id)\n",
    "    svc_status = resp[\"result\"][\"serviceStatus\"]\n",
    "    print(\"svc deploy status:\", svc_status)\n",
    "    if svc_status in [\"Done\", \"\"]:\n",
    "        sft_model_endpoint=resp[\"result\"][\"uri\"]\n",
    "        break\n",
    "    time.sleep(30)\n",
    "print(\"sft_model_endpoint:\", sft_model_endpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e15e9572a8110076",
   "metadata": {},
   "source": [
    "### 访问SFT模型服务\n",
    "\n",
    "在访问服务之前，首先需要配置预测服务应用的AK/SK，可以从控制台中的应用接入里获取：\n",
    "\n",
    "![app_ak_sk](img/app_ak_sk.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "196d307b1982fef6",
   "metadata": {
    "tags": [
     "cell_skip"
    ]
   },
   "outputs": [],
   "source": [
    "# 预测服务的\n",
    "os.environ[\"QIANFAN_AK\"] = \"your_app_ak\"\n",
    "os.environ[\"QIANFAN_SK\"] = \"your_app_sk\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "98bf0e3a8b1c309c",
   "metadata": {
    "tags": [
     "cell_skip"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "QfResponse(code=200, headers={'Access-Control-Allow-Headers': 'Content-Type', 'Access-Control-Allow-Origin': '*', 'Appid': '26217442', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json; charset=utf-8', 'Date': 'Thu, 26 Oct 2023 12:12:32 GMT', 'P3p': 'CP=\" OTI DSP COR IVA OUR IND COM \"', 'Server': 'Apache', 'Set-Cookie': 'BAIDUID=7FBBE2E6B5D83AB3C6A3B5F2ABB8E430:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2145916555; path=/; domain=.baidu.com; version=1', 'Statement': 'AI-generated', 'Vary': 'Accept-Encoding', 'X-Aipe-Self-Def': 'eb_total_tokens:17-id:as-j0ker4w7ar', 'X-Baidu-Request-Id': '9a143024d69419cb6e8ab8bb2d751b8e1000130', 'X-Openapi-Server-Timestamp': '1698322351', 'Content-Length': '222'}, body={'id': 'as-j0ker4w7ar', 'object': 'chat.completion', 'created': 1698322352, 'result': '我是文心一言，是百度研发的。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 5, 'completion_tokens': 12, 'total_tokens': 17}}, image=None)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import qianfan\n",
    "chat_comp = qianfan.ChatCompletion(endpoint=sft_model_endpoint)\n",
    "msgs = qianfan.Messages()\n",
    "msgs.append(message=\"你好，你是谁？\", role=qianfan.QfRole.User)\n",
    "chat_resp = chat_comp.do(messages=msgs)\n",
    "chat_resp\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f8cdaf631874afc",
   "metadata": {},
   "source": [
    "### 总结\n",
    "至此，你已经通过SFT训练成功的微调出自己的大语言模型，SFT是一个很好的手段，用于针对于特定场景下的语料进行模型特化，以增强模型在某方面的能力，非常适合对于垂直领域内的应用。除了SFT之外，千帆平台还提供了RLHF功能，SDK也将在将来持续跟进LLMOps能力。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 2414.958307,
   "end_time": "2024-03-19T10:41:30.729909",
   "environment_variables": {},
   "exception": null,
   "input_path": "/Users/guoweiming/PycharmProjects/bce-qianfan-sdk-fork/test/_temp=20240319_18:01:15/finetune/main_api_based_finetune.ipynb",
   "output_path": "/Users/guoweiming/PycharmProjects/bce-qianfan-sdk-fork/test/_output=20240319_18:01:15/finetune/main_api_based_finetune.ipynb",
   "parameters": {},
   "start_time": "2024-03-19T10:01:15.771602",
   "version": "2.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
